Introduction
The given dataset contains yearly population data for 237 countries covering the 73 years from 1950 to 2022. The data contains 17,301 rows (237 countries × 73 years) and 8 columns. The columns are the ISO3 code, the name of the country, the year, the total population (recorded as of July 1st), the male and female populations, the calculated male:female ratio, and the median age of each population.
To analyse the data, we used three major visualisations. The first is a choropleth map that shows the median age of each country from 1950 to 2022, which lets us examine region-based trends in median age over time. The second is a scatterplot that examines the relationship between population change and median age, with a slider that allows us to view the data for individual years between 1950 and 2022. The third visualises the sex ratio of the top and bottom five countries in 2012 and 2022, providing insight into how gender demographics have shifted over time.
Our aim is to analyse the data and uncover insights into trends related to median age, population sex ratio and population change over time. Through visual analysis we aim to identify significant patterns across nations and to investigate possible causes for these effects.
Using these visualisations, we hope to gain a better understanding of how population dynamics have developed and changed with regard to these variables.
Data Adjustments
Created a Continent column so that countries can be grouped by continent for further analysis.
Kosovo is not universally recognised, so its continent had to be added manually.
Simplified some of the provided variable names.
Created three columns that record each country's change in population (total, male, and female) relative to the previous year.
Created a Rank column for the sex ratio for use in further analysis.
load("pop.Rdata")
library(tidyverse)
library(fpp3)
#install.packages("mapCountryData")
#library(mapCountryData)
library(countrycode)
library(dplyr)
library(ggrepel)
library(rnaturalearthdata)
library(ggplot2)
library(plotly)
library(ggiraph)
library(patchwork) # for combining plots, e.g. p1 + p2
library(sf)
library(rnaturalearth)
library(viridis)
library(gapminder)
library(crosstalk)
library(tidyr)
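The adjustments listed above are described but not shown. The following is a minimal sketch of how they could be carried out with dplyr and countrycode on a small made-up table. The column names TotalPop and SexRatio, the derived names PopChange and SexRatioRank, the Kosovo code XKX, and all values are assumptions for illustration, not necessarily what pop.Rdata contains. Only the total population change is shown; the male and female columns would follow the same pattern.

```r
library(dplyr)
library(countrycode)

# Toy stand-in for `pop`; TotalPop, SexRatio and all values are hypothetical
pop_toy <- tibble(
  ISO3_code = c("IRL", "IRL", "XKX", "XKX"),
  Location  = c("Ireland", "Ireland", "Kosovo", "Kosovo"),
  Time      = c(2021, 2022, 2021, 2022),
  TotalPop  = c(5000, 5050, 1700, 1690),
  SexRatio  = c(98.5, 98.6, 101.2, 101.0)
)

pop_adj <- pop_toy %>%
  mutate(
    # Look up the continent from the ISO3 code; unmatched codes become NA
    Continent = countrycode(ISO3_code, origin = "iso3c",
                            destination = "continent", warn = FALSE),
    # Assign Kosovo's continent by hand (code XKX is an assumption)
    Continent = if_else(ISO3_code == "XKX", "Europe", Continent)
  ) %>%
  # Change relative to the previous year, computed within each country
  group_by(ISO3_code) %>%
  arrange(Time, .by_group = TRUE) %>%
  mutate(PopChange = TotalPop - lag(TotalPop)) %>%
  # Rank countries by sex ratio within each year (1 = highest ratio)
  group_by(Time) %>%
  mutate(SexRatioRank = min_rank(desc(SexRatio))) %>%
  ungroup()
```

The first year of each country has no previous value, so PopChange is NA there; whether the original analysis dropped or kept those rows is not stated.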
Question 1
What are the regional and country-level variations in median age across the world, and what factors might be contributing to these differences?
Plot
#setwd("/users/students/19505453/My folder/st302/Project")
#loading in the dataset
#load("~/My folder/st302/Project/pop.Rdata")
##pop <- pop
#create world map template
world_map <- ne_countries(scale = "small", returnclass = "sf")
# Select only the columns needed for the plot and rename ISO3_code to match the world map template.
med_data <- pop %>%
  select(ISO3_code, Location, Time, MedianAgePop) %>%
  rename(iso_a3_eh = ISO3_code)
#for specific time frame viewing, takes less time to load
# for(i in c(1950:2000)) {
# med_data <- med_data %>%
# filter(!Time == i)
# }
#merge the data to add median age to world map
merged_data_med <- merge(world_map, med_data, by = "iso_a3_eh")
#
# view(world_map)
# Build the ggplot:
# - lwd is reduced so that smaller countries remain visible
# - direction = -1 reverses the viridis scale so that higher median ages are darker
# - the text aesthetic supplies the tooltip content used by plotly
p_med <- ggplot(merged_data_med, aes(frame = Time)) +
  geom_sf(aes(fill = MedianAgePop,
              # paste0() has no sep argument, so the line breaks are passed as ordinary strings
              text = paste0("Country: ", name, "\n",
                            "Median Age: ", MedianAgePop, "\n",
                            "Year: ", Time)),
          lwd = 0.1,
          color = "black") +
  scale_fill_viridis(direction = -1) +
  xlab("Longitude") +
  ylab("Latitude") +
  ggtitle("Median Age across the globe",
          subtitle = "237 countries.") +
  theme_bw() +
  theme(panel.background = element_rect(fill = "aliceblue"))
#assign the ggplotly plot to a variable name
pmedotly <- ggplotly(p_med, tooltip = c("colour","text"))
# Set the animation options to remove the transition between frames: the default transition re-centres and redistributes the polygon points, which does not look good.
pmedotly %>%
animation_opts(transition = 0,frame = list(duration = 5), mode = "immediate", easing = NULL) %>%
style(hoveron = "fill")
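One point worth checking in the plot code above: merge() keeps only rows whose key appears in both inputs by default, so any country in the population data whose code is missing from the map template is dropped silently. The following is a minimal sketch of how that coverage could be checked, using small made-up tables rather than the real data. The names map_codes, med_toy, unmatched and merged_toy, and all values, are illustrative only.

```r
library(dplyr)

# Toy stand-ins: codes available in the map template and in the population data
map_codes <- tibble(iso_a3_eh = c("IRL", "FRA", "USA"))
med_toy <- tibble(
  iso_a3_eh    = c("IRL", "IRL", "FRA", "TUV"),
  Time         = c(2021, 2022, 2022, 2022),
  MedianAgePop = c(37.5, 37.8, 41.6, 25.9)  # made-up values
)

# Codes present in the data but absent from the map template
unmatched <- med_toy %>%
  distinct(iso_a3_eh) %>%
  anti_join(map_codes, by = "iso_a3_eh")

# The default merge behaves as an inner join, so the TUV row is lost
merged_toy <- merge(map_codes, med_toy, by = "iso_a3_eh")
```

With the real objects, the same comparison could be made between med_data and the iso_a3_eh column of world_map to see how many of the 237 countries actually appear on the map.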